Lightweight and Accurate Silent Data Corruption Detection in Ordinary Differential Equation Solvers
Abstract
Silent data corruptions (SDCs) are errors that corrupt the system or falsify results while remaining unnoticed by firmware or operating systems. In numerical integration solvers, SDCs that impact the accuracy of the solver are considered significant. Detecting SDCs in high-performance computing is necessary because results must be trustworthy, and the growing number and complexity of components in emerging large-scale architectures make SDCs more likely to occur. Until recently, SDC detection methods consisted of replicating the processes of the execution or using checksums (for example, algorithm-based fault tolerance). More recently, new detection methods have been proposed that rely on mathematical properties of numerical kernels or perform data analysis of the results modified by the application. None of those methods, however, provides a lightweight solution guaranteeing that all significant SDCs are detected. We propose a new method called Hot Rod as a solution to this problem. It checks and potentially corrects the data produced by numerical integration solvers. Our theoretical model shows that all significant SDCs can be detected. We present two detectors and conduct experiments on streamline integration from the WRF meteorology application. Compared with the algorithmic detection methods, the accuracy of our first detector is increased by 52% with a similar false detection rate. The second detector has a false detection rate one order of magnitude lower than these detection methods while improving the detection accuracy by 23%. The computational overhead is lower than 5% in both cases. The model has been developed for an explicit Runge-Kutta method, although it can be generalized to other solvers.
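The abstract does not spell out how Hot Rod works, so the following is only a generic illustration of checking the data produced by an explicit Runge-Kutta step. It is not the authors' algorithm. It assumes a scalar ODE, the Heun-Euler embedded pair, and a fixed tolerance `tol`. The names `heun_euler_step`, `checked_step`, and `corrupt` are hypothetical. A step whose embedded error estimate exceeds `tol` is recomputed once, and if the recomputed step passes, the first attempt is treated as a suspected transient corruption.

```python
def heun_euler_step(f, t, y, h, corrupt=None):
    """One Heun-Euler step; returns (second-order result, error estimate).

    `corrupt` is a hypothetical hook that perturbs the second stage
    to simulate a silent data corruption.
    """
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    if corrupt is not None:
        k2 = corrupt(k2)
    y_low = y + h * k1                 # first-order (Euler) result
    y_high = y + 0.5 * h * (k1 + k2)   # second-order (Heun) result
    return y_high, abs(y_high - y_low)


def checked_step(f, t, y, h, tol, corrupt=None):
    """Return (y_new, err, sdc_suspected) for one checked step.

    Assumption: a corruption is transient, so a clean recomputation
    does not reproduce it.
    """
    y_new, err = heun_euler_step(f, t, y, h, corrupt)
    if err <= tol:
        return y_new, err, False
    # Estimate too large: recompute once without the fault hook.
    y_retry, err_retry = heun_euler_step(f, t, y, h)
    # If the retry passes, the first attempt was likely corrupted;
    # otherwise the step is genuinely too inaccurate for this h.
    return y_retry, err_retry, err_retry <= tol


if __name__ == "__main__":
    decay = lambda t, y: -y
    print(checked_step(decay, 0.0, 1.0, 0.1, tol=0.01))
    print(checked_step(decay, 0.0, 1.0, 0.1, tol=0.01,
                       corrupt=lambda k: k + 1.0))
```

In this sketch a corruption small enough to keep the error estimate below `tol` goes undetected. This loosely mirrors the abstract's notion that only SDCs affecting solver accuracy are significant, but the sketch offers none of the guarantees claimed for Hot Rod.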
Similar resources
Exploring Partial Replication to Improve Lightweight Silent Data Corruption Detection for HPC Applications
Silent data corruption (SDC) poses a great challenge for high-performance computing (HPC) applications as we move to extreme-scale systems. If not dealt with properly, SDC has the potential to influence important scientific results, leading scientists to wrong conclusions. In previous work, our detector was able to detect SDC in HPC applications to a certain level by using the peculiarities of t...
Discrete-time Solutions to the Continuous-time Differential Lyapunov Equation With Applications to Kalman Filtering, Report no. LiTH-ISY-R-3055
Prediction and filtering of continuous-time stochastic processes require a solver of a continuous-time differential Lyapunov equation (cdle). Even though this can be recast into an ordinary differential equation (ode), where standard solvers can be applied, the dominating approach in Kalman filter applications is to discretize the system and then apply the discrete-time difference Lyapunov equation (d...
Rational Heuristics for Rational Solutions of Riccati Equations
We describe some new algorithm and heuristics for computing the polynomial and rational solutions of bounded degree of a class of ordinary differential equations, which includes generalized Riccati equations. As a consequence, our methods can be used for factoring linear ordinary differential equations. Since they generate systems of algebraic equations in at most n unknowns, where n is the order...
TECHNISCHE UNIVERSITÄT BERLIN Analysis and Reformulation of Linear Delay Differential-Algebraic Equations
In this paper, we study general linear systems of delay differential-algebraic equations (DDAEs) of arbitrary order. We show that under some consistency conditions, every linear high-order DAE can be reformulated as an underlying high-order ordinary differential equation (ODE) and that every linear DDAE with single delay can be reformulated as a high-order delay differential equation (DDE). We der...